Robust subgroup discovery

Authors

Abstract

We introduce the problem of robust subgroup discovery, i.e., finding a set of interpretable descriptions of subsets that 1) stand out with respect to one or more target attributes, 2) are statistically robust, and 3) are non-redundant. Many attempts have been made to mine either locally robust subgroups or to tackle the pattern explosion, but we are the first to address both challenges at the same time from a global modelling perspective. First, we formulate the broad model class of subgroup lists, i.e., ordered sets of subgroups, for univariate and multivariate targets that can consist of nominal or numeric variables, and that includes traditional top-1 subgroup discovery in its definition. This novel model class allows us to formalise the problem of optimal robust subgroup discovery using the Minimum Description Length (MDL) principle, where we resort to Normalised Maximum Likelihood and Bayesian encodings for nominal and numeric targets, respectively. Second, finding optimal subgroup lists is NP-hard. Therefore, we propose SSD++, a greedy heuristic that finds good subgroup lists and guarantees that the most significant subgroup found according to the MDL criterion is added in each iteration. In fact, the greedy gain is shown to be equivalent to a one-sample proportion, multinomial, or t-test between the subgroup and dataset marginal target distributions plus a multiple hypothesis testing penalty. Furthermore, we empirically show on 54 datasets that SSD++ outperforms previous subgroup discovery methods in terms of quality, generalisation on unseen data, and subgroup list size.
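To make two ingredients of the abstract concrete, here is a minimal Python sketch written under our own assumptions: the ordered (first-match) semantics of a subgroup list with a final default part, and the Normalised Maximum Likelihood (NML) code length of nominal target labels. The NML parametric complexity is computed with the known linear-time recurrence of Kontkanen and Myllymäki. All names (`Subgroup`, `partition`, `nml_code_length`, `data_code_length`) and the toy data are hypothetical. This is not SSD++: it omits the model-encoding cost, the Bayesian encoding for numeric targets, and the greedy search.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def multinomial_complexity(n: int, k: int) -> float:
    """Parametric complexity C(n, k) of the k-category multinomial NML
    distribution over n observations, via the linear-time recurrence
    C(n, j + 2) = C(n, j + 1) + (n / j) * C(n, j)."""
    if n == 0 or k == 1:
        return 1.0

    def log_term(h: int) -> float:
        # log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), taking 0 * log(0) = 0
        value = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            value += h * math.log(h / n)
        if h < n:
            value += (n - h) * math.log((n - h) / n)
        return value

    c_prev = 1.0                                               # C(n, 1)
    c_curr = sum(math.exp(log_term(h)) for h in range(n + 1))  # C(n, 2)
    for j in range(1, k - 1):  # step up from C(n, 2) to C(n, k)
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr


def nml_code_length(labels: Sequence[object], k: int) -> float:
    """Bits to encode nominal labels with the NML distribution:
    -log2 P_ML(labels) + log2 C(n, k)."""
    n = len(labels)
    if n == 0:
        return 0.0
    neg_log_ml = -sum(c * math.log2(c / n) for c in Counter(labels).values())
    return neg_log_ml + math.log2(multinomial_complexity(n, k))


@dataclass
class Subgroup:
    description: str               # human-readable pattern, e.g. "smoker = True"
    covers: Callable[[Row], bool]  # predicate selecting the rows it describes


def partition(rows: List[Row], subgroups: List[Subgroup]) -> List[List[Row]]:
    """Ordered (first-match) semantics: a row belongs to the first subgroup
    covering it; uncovered rows go to a final default part."""
    parts: List[List[Row]] = [[] for _ in range(len(subgroups) + 1)]
    for row in rows:
        idx = next((i for i, sg in enumerate(subgroups) if sg.covers(row)),
                   len(subgroups))
        parts[idx].append(row)
    return parts


def data_code_length(rows: List[Row], subgroups: List[Subgroup],
                     target: str, k: int) -> float:
    """Simplified data cost: NML bits of the target within every part.
    No model-encoding cost is included (hypothetical simplification)."""
    return sum(nml_code_length([row[target] for row in part], k)
               for part in partition(rows, subgroups))


rows = [
    {"smoker": True, "risk": "high"},
    {"smoker": True, "risk": "high"},
    {"smoker": False, "risk": "low"},
    {"smoker": False, "risk": "low"},
]
smokers = Subgroup("smoker = True", lambda r: r["smoker"])
print(round(data_code_length(rows, [], "risk", k=2), 2))         # 5.69
print(round(data_code_length(rows, [smokers], "risk", k=2), 2))  # 2.64
```

In this toy example, placing the two smokers in their own subgroup lowers the simplified data cost from about 5.69 to 2.64 bits. Because the sketch charges nothing for describing the subgroups themselves, it should not be read as a complete MDL score; a two-part MDL criterion also encodes the model.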

Similar resources

Subgroup Discovery

The discovery of (interesting) subgroups has a high practical relevance in all domains of science or business. For example, consider statements such as: “the unemployment rate is above average for young men with a low educational level”, “smokers with a positive family history are at a significantly higher risk for coronary heart disease”, or “single males living in rural areas do rarely take o...

Subgroup Discovery Method SUBARP

This paper summarizes the subgroup discovery method SUBARP. The discussion is to provide an intuitive understanding of the ideas underlying the method, and mathematical details are omitted.

Contrasting Subgroup Discovery

Department of Computer Science and Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana, Slovenia International Postgraduate School Jožef Stefan, Ljubljana, Slovenia Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia Email: {laura.langohr, hannu...

Subgroup Discovery – Advanced Review

Subgroup discovery is a broadly applicable descriptive data mining technique for identifying interesting subgroups according to some property of interest. This article summarizes fundamentals of subgroup discovery, before it reviews algorithms and further advanced methodological issues. In addition, we briefly discuss tools and applications of subgroup discovery approaches. In that context, we ...

Journal

Journal title: Data Mining and Knowledge Discovery

Year: 2022

ISSN: ['1573-756X', '1384-5810']

DOI: https://doi.org/10.1007/s10618-022-00856-x